Search CORE

9 research outputs found

Recommended from our members

Harmonic scheduling of linear recurrences in digital filter design

Author: Dutt Nikil
Nicolau Alexandru
Wang Haigeng
Publication venue: eScholarship, University of California
Publication date: 14/02/1992

Linear difference equations involving recurrences are fundamental equations that describe many important signal processing applications. For many high sample rate digital filter applications, we need to effectively parallelize the linear difference equations used to describe digital filters, a difficult task due to the recurrences inherent in the data dependences. We present a novel approach, Harmonic Scheduling, that exploits parallelism in these recurrences beyond loop-carried dependences, and which generates optimal schedules for parallel evaluation of linear difference equations with resource constraints. This approach also enables us to derive a parallel schedule with minimum control overhead, given an execution time with resource constraints. We also present a Harmonic Scheduling algorithm that generates optimal schedules for digital filters described by second-order difference equations with resource constraints.
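To make concrete why such a recurrence admits parallelism beyond its loop-carried dependence, the sketch below evaluates a second-order difference equation two ways: directly, and as a running product of 3x3 matrices. Because matrix multiplication is associative, that product can in principle be regrouped across processors. This is a generic textbook reformulation, not the Harmonic Scheduling algorithm of the abstract; the function names, the coefficient names `a1` and `a2`, and the zero initial conditions are assumptions made for illustration.

```python
def filter_direct(a1, a2, x):
    """Evaluate y[n] = a1*y[n-1] + a2*y[n-2] + x[n] with y[-1] = y[-2] = 0."""
    y = []
    y1 = y2 = 0.0  # y[n-1] and y[n-2]
    for xn in x:
        yn = a1 * y1 + a2 * y2 + xn
        y.append(yn)
        y2, y1 = y1, yn
    return y


def matmul3(A, B):
    """Plain 3x3 matrix product A @ B on nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def filter_via_matrices(a1, a2, x):
    """Same filter, written as a prefix product of matrices.

    The state s_n = (y[n], y[n-1], 1) satisfies s_n = M_n s_{n-1}, so
    s_n = M_n ... M_1 s_0 with s_0 = (0, 0, 1). The product is accumulated
    left to right here; associativity is what would allow a parallel
    schedule to regroup it.
    """
    prefix = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    y = []
    for xn in x:
        M = [[a1, a2, xn], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
        prefix = matmul3(M, prefix)
        y.append(prefix[0][2])  # first component of prefix applied to s_0
    return y
```

Both functions return the same samples for the same input; the second form trades extra arithmetic for the freedom to reorder the work.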

eScholarship - University of California

Optimal Schedules for Parallel Prefix Computation with Bounded Resources

Author: Alexandru Nicolau
Haigeng Wang
Publication venue: ACM Press
Publication date: 01/01/1991

Given $x_1, \ldots, x_N$, parallel prefix computes $x_1 \circ x_2 \circ \cdots \circ x_k$, for $1 \le k \le N$, with associative operation $\circ$. We show optimal schedules for parallel prefix computation with a fixed number of resources $p \ge 2$ for a prefix of size $N \ge p(p+1)/2$. The time of the optimal schedules with $p$ resources is $\lceil 2N/(p+1) \rceil$ for $N \ge p(p+1)/2$, which we prove to be the strict lower bound (i.e., the best that can be achieved). We then present a pipelined form of optimal schedules with $\lceil 2N/(p+1) \rceil + \lceil (p-1)/2 \rceil - 1$ time, which takes a constant overhead of $\lceil (p-1)/2 \rceil$ time more than the optimal schedules. Parallel prefix is an important common operation in many algorithms, including the evaluation of polynomials, general Horner expressions, carry look-ahead circuits, and ranking and packing problems. One of the most important applications of parallel prefix is loop-parallelizing transformation.
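As a small illustration of the quantities in this abstract, the sketch below computes a prefix sequentially and evaluates the two step counts, reading them as ceil(2N/(p+1)) and ceil(2N/(p+1)) + ceil((p-1)/2) - 1. The function names are hypothetical, the code does not construct the schedules themselves, and it does not check the abstract's condition relating N to p(p+1)/2.

```python
from itertools import accumulate
import operator


def prefix(xs, op):
    """Sequential reference: [x1, x1 op x2, ..., x1 op ... op xN]."""
    return list(accumulate(xs, op))


def ceil_div(a, b):
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def optimal_time(n, p):
    """Step count ceil(2N/(p+1)) quoted for the optimal schedules."""
    return ceil_div(2 * n, p + 1)


def pipelined_time(n, p):
    """Step count quoted for the pipelined form of the schedules."""
    return optimal_time(n, p) + ceil_div(p - 1, 2) - 1


print(prefix([1, 2, 3, 4], operator.add))
print(optimal_time(100, 4), pipelined_time(100, 4))
```

For N = 100 and p = 4 these formulas give 40 and 41 steps, against 99 applications of the operation when the prefix is computed one element at a time.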

CiteSeerX

Speedup of Band Linear Recurrences in the Presence of Resource Constraints

Author: Alexandru Nicolau
Haigeng Wang
Publication date: 01/01/1992

An m-th order linear recurrence system of N equations computes $x_i = c_i + \sum_{j=i-m}^{i-1} a_{ij} x_j$ for $1 \le i \le N$. Linear recurrences have a role of central importance in computer design, numerical analysis, program analysis, digital signal processing and many non-numerical algorithms. However, programs containing band linear recurrences are difficult to significantly parallelize due to loop-carried dependences. We present a new method for systematically approaching the optimal parallel schedules for computing mth-order linear recurrences with a fixed number of processors p independent of problem size N. Using our method, we first derive two kinds of parallel schedules, called the pipelined schedules and the exact schedules, for parallel evaluation of band linear recurrences. Our schedules have better execution times than the fastest previously published parallel schedules for $p > m \ge 1$. In particular, the exact schedules achieve an execution time of (2m^2 + 3m)N / (p + (m(m+1)(2m+1)) / 2..

CiteSeerX

Crossref

Speedup of banded linear recurrences in the presence of resource constraints

Author: Nicolau Alexandru
Wang Haigeng
Publication venue: eScholarship, University of California
Publication date: 24/12/1991

An m-th order linear recurrence system of N equations computes $x_i = c_i + \sum_{j=i-m}^{i-1} a_{ij} x_j$ for $1 \le i \le N$. Linear recurrences have a role of central importance in computer design, numerical analysis, program analysis, image processing and vision. However, programs containing banded linear recurrences are difficult to parallelize due to loop-carried dependences. In this paper, we first present a family of schedules, called the exact schedules, for parallel evaluation of low-order ($m \le 2$) banded linear recurrences with an execution time of $(2m^2 + 3m)N / \left(p + \frac{m(m+1)(2m+1)}{2(2 + \lfloor \log m \rfloor)}\right)$ for $0 < m \le 2$, $N > (p+5)(2p+3)/6$, and number of processors $p > m$. We show that the exact schedules achieve the strict time lower bound under the matrix multiplication model. Next, we derive another family of schedules, called the pipelined schedules, with better program-space efficiency and with an execution time of pipeline startup time $+\,(2m^2 + 3m)N/(p + (2m+1)/2)$ for $m = 1$; pipeline startup time $+\,(6m+2)N/(p + (2m+1))$ for $m > 1$, $m < p \le 4m+1$; and pipeline startup time $+\,(2m^2 + 3m)N/(p + (m-1)(2m+1))$ for $m > 1$, $p > 4m+1$. This is the first parallel algorithm that achieves this time bound, which improves on the fastest previously published algorithm by a factor of at least $(p + 2m^2 - m - 1)/(p + m + 1/2)$ for $m > 1$. We illustrate the technique by parallelizing loops containing linear recurrences and demonstrate the available speedup on a VLIW architecture with experimental results obtained using our pipelined schedules.
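For reference, the recurrence defined at the start of this abstract can be evaluated sequentially as in the sketch below, which is the loop whose carried dependence the schedules are designed to work around. The function name, the callable `a(i, j)` for the coefficients, and the convention that terms with index below 1 are treated as zero are assumptions for illustration; the code does not implement the exact or pipelined schedules.

```python
def banded_recurrence(c, a, m):
    """Evaluate x_i = c_i + sum_{j=i-m}^{i-1} a(i, j) * x_j for i = 1..N.

    c : list of the N constants c_1..c_N (stored 0-indexed)
    a : callable returning the coefficient a_ij for 1-indexed i, j
    m : order of the recurrence (band width)
    Terms with j < 1 are skipped, i.e. treated as zero (an assumption).
    """
    n = len(c)
    x = [0] * (n + 1)  # x[0] is unused padding so indices match the formula
    for i in range(1, n + 1):
        acc = c[i - 1]
        for j in range(max(1, i - m), i):
            acc += a(i, j) * x[j]  # each x_i depends on up to m earlier values
        x[i] = acc
    return x[1:]


# m = 1 with unit coefficients reduces to a running sum.
print(banded_recurrence([1, 2, 3, 4], lambda i, j: 1, 1))
# m = 2 with unit coefficients produces a Fibonacci-like sequence.
print(banded_recurrence([1, 1, 0, 0, 0], lambda i, j: 1, 2))
```

The inner loop runs at most m times per element, so the sequential cost grows as m times N; the execution times in the abstract describe how that work is spread over p processors.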

eScholarship - University of California

Computing Programs Containing Band Linear Recurrences on Vector Supercomputers

Author: Alexandru Nicolau
Haigeng Wang
Publication date: 01/01/1992

Many large-scale scientific and engineering computations, e.g., some of the Grand Challenge problems [1], spend a major portion of execution time in their core loops computing band linear recurrences (BLRs). Conventional compiler parallelization techniques [4] cannot generate scalable parallel code for this type of computation because they respect loop-carried dependences (LCDs) in programs and there is a limited amount of parallelism in a BLR with respect to LCDs. For many applications, using library routines to replace the core BLR requires the separation of the BLR from its dependent computation, which usually incurs significant overhead. In this paper, we present a new scalable algorithm, called the Regular Schedule, for parallel evaluation of BLRs. We describe our implementation of the Regular Schedule and discuss how to obtain maximum memory throughput in implementing the schedule on vector supercomputers. We also illustrate our approach, based on our Regular Schedule, to parallel..

CiteSeerX

eScholarship - University of California

Harmonic scheduling of linear recurrences in digital filter design

Author: Dutt Nikil
Nicolau Alexandru
Wang Haigeng
Publication venue: eScholarship, University of California
Publication date: 14/02/1992

eScholarship - University of California

The strict time lower bound and optimal schedules for parallel prefix with resource constraints

Author: Nicolau Alexandru
Siu Kai-Yeung S.
Wang Haigeng
Publication venue: eScholarship, University of California
Publication date: 01/01/1992

Parallel prefix is a fundamental common operation at the core of many important applications, e.g., the Grand Challenge problems, circuit design, digital signal processing, graph optimizations, and computational geometry. Given $x_1, \ldots, x_N$, parallel prefix computes $x_1 \circ x_2 \circ \cdots \circ x_k$, for 1 [...] $p(p+1)/2$, we derive Harmonic Schedules and show that the Harmonic Schedules achieve the strict optimal time (steps), $\lceil 2(N-1)/(p+1) \rceil$. We also derive Pipelined Schedules, optimal schedules with $\lceil 2(N-1)/(p+1) \rceil + \lceil (p-1)/2 \rceil - 1$ time, which take a constant overhead of $\lceil (p-1)/2 \rceil$ time steps more than the strict optimal time but have the smallest loop body. Both the Harmonic Schedules and the Pipelined Schedules are simple, concise, and easy to program, with regular patterns of computation. For prefix of N elements on p processors in

eScholarship - University of California

High-Level Synthesis of Scalable Architectures for IIR Filters Using Parameterized MCM's

Author: Alexandru Nicolau
Haigeng Wang
Nikil Dutt

We describe the high-level synthesis of scalable parallel architectures implementing infinite impulse response (IIR) filters using multi-chip modules (MCMs). Our approach is based on a new class of parallel schedules for computing mth-order IIR filters, called regular schedules. The simplicity of the regular schedules facilitates characterization of their inter-processor communications, which is generally difficult to express for parallel algorithms. This characterization enables us to generate instruction-level behavior of the design that can be easily mapped onto MCM-based architectures. We illustrate the use of the regular schedule in algorithmic-level synthesis of MCM-based parallel application-specific processors implementing the fifth-order elliptic wave filter benchmark. Our approach yields scalable performance, measured in the filter's sample rate, on both multiple-bus architectures and mesh architectures, which is not ..

CiteSeerX

High-Level Synthesis of Scalable Architectures for IIR Filters Using Multichip Modules

Author: Alexandru Nicolau
Haigeng Wang
Kai-yeung Sunny Siu
Nikil Dutt
Publication date: 01/01/1992

We present a new technique for the high-level synthesis of scalable MCM-based architectures implementing infinite impulse response (IIR) filters. Our technique is based on the regular schedules, a class of parallel schedules for computing mth-order IIR filters. The simplicity of the regular schedules facilitates characterization of their inter-processor communications, which is generally difficult to express for parallel algorithms. The characterization of inter-processor communications of the regular schedules enables us to generate instruction-level behavior of the design that can be easily mapped onto MCM-based architectures. We illustrate this mapping of the regular schedules onto an MCM-based architecture by designing a special-purpose processor for the fifth-order elliptic wave filter. Our design yields a scalable performance measured in the filter's sample rate, which is not known to have been achieved by previously published designs. This work differs significantly from "trad..

CiteSeerX

eScholarship - University of California